Evaluating the Effectiveness of Prompts and Responses in Generative Artificial Intelligence Systems

Abstract

Our paper proposes a comprehensive framework for evaluating the effectiveness of prompts and the corresponding responses generated by Generative Artificial Intelligence (GenAI) systems. The framework incorporates both objective metrics (accuracy, speed, relevancy, and format) and subjective metrics (coherence, tone, clarity, verbosity, and user satisfaction). A sample evaluation is performed on prompts sent to the Gemini and ChatGPT GenAI models. Additionally, the framework employs various feedback mechanisms, such as surveys, expert interviews, and automated reinforcement learning from human feedback (RLHF), to iteratively enhance the performance and reliability of GenAI models. By providing a holistic approach to evaluating and improving prompt-response effectiveness, this framework contributes to the development of more credible and user-friendly AI systems.
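
To make the two metric categories concrete, the sketch below shows one plausible way to combine per-metric scores into a single effectiveness score for a prompt-response pair. The metric weights, the equal objective/subjective blend, and all sample score values are assumptions for illustration only; the paper's actual scoring rubric may differ.

```python
# Minimal sketch: combining per-metric scores into one effectiveness score.
# The weights, the 50/50 objective/subjective blend, and the sample values
# below are illustrative assumptions, not the paper's actual rubric.

OBJECTIVE_WEIGHTS = {"accuracy": 0.4, "speed": 0.2, "relevancy": 0.3, "format": 0.1}
SUBJECTIVE_WEIGHTS = {"coherence": 0.25, "tone": 0.15, "clarity": 0.25,
                      "verbosity": 0.15, "user_satisfaction": 0.20}

def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-metric scores, each assumed to be in [0, 1]."""
    return sum(scores[m] * w for m, w in weights.items()) / sum(weights.values())

def evaluate_response(objective: dict[str, float],
                      subjective: dict[str, float]) -> float:
    """Blend the objective and subjective sub-scores with equal emphasis."""
    return 0.5 * weighted_score(objective, OBJECTIVE_WEIGHTS) \
         + 0.5 * weighted_score(subjective, SUBJECTIVE_WEIGHTS)

# Example: scoring a single prompt-response pair with made-up values.
objective = {"accuracy": 0.9, "speed": 0.8, "relevancy": 0.95, "format": 1.0}
subjective = {"coherence": 0.85, "tone": 0.9, "clarity": 0.8,
              "verbosity": 0.7, "user_satisfaction": 0.88}
print(f"Overall effectiveness: {evaluate_response(objective, subjective):.3f}")
```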

Type
Conference Paper
Publication
International Conference on Computer Applications in Industry and Engineering (CAINE), October 21-22, 2024, San Diego, CA, USA.
Ruida Zeng